pacman::p_load(olsrr, ggstatsplot, sf, tmap, tidyverse, performance, see, sfdep, GWmodel, lubridate)Take-home Ex03
Overview and Objectives
In this take-home my aim is to prototype and evaluate the visualisation of pages and components in our Shiny application.
My responsibilities will be performing Geographically Weighted Regression (GWR) on the districts of Malaysia.
Data
Income Inequality Data: Household income inequality by district (https://data.gov.my/data-catalogue/hh_inequality_district)
Income Data: Mean and median gross monthly by district (https://data.gov.my/data-catalogue/hh_income_district)
Poverty Data: Poverty rates by district (https://data.gov.my/data-catalogue/hh_poverty_district)
Crime Data: Crime rates by district (https://data.gov.my/data-catalogue/crime_district)
Malaysia - Subnational Administrative Boundaries: (https://data.humdata.org/dataset/cod-ab-mys?)
Packages
- olsrr: Provides tools for building and validating OLS regression models with stepwise selection and diagnostics.
ggstatsplot: Extends ggplot2 with statistical tests and data visualization in a single, user-friendly syntax.
sf: Provides a standardized way to work with spatial vector data (points, lines, polygons).
tmap: Creates thematic maps with interactive and static modes for spatial visualization.
tidyverse: A collection of packages for easy data manipulation, visualization, and analysis.
performance: Offers tools to assess, validate, and compare regression models.
see: Supplies visual themes and color palettes to enhance ggplot2 visualizations.
sfdep: Provides spatial dependency analysis tools specifically for sf objects.
GWmodel: Implements geographically weighted regression (GWR) and spatial analysis methods.
lubridate: Simplifies the handling and manipulation of dates and times.
Data
Aspatial
crime <- read_csv("data/aspatial/crime_district.csv")Rows: 19152 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): state, district, category, type
dbl (1): crimes
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
inequality <- read_csv("data/aspatial/inequality_district.csv")Rows: 318 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (1): gini
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
income <- read_csv("data/aspatial/income_district.csv")Rows: 318 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (2): income_mean, income_median
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
poverty <- read_csv("data/aspatial/poverty_district.csv")Rows: 318 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, district
dbl (2): poverty_absolute, poverty_relative
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Geospatial
msia = st_read(dsn = "data/geospatial", layer = "mys_admbnda_adm2_unhcr_20210211")Reading layer `mys_admbnda_adm2_unhcr_20210211' from data source
`C:\ryanpxp\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 144 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS: WGS 84
Data Wrangling
st_crs(msia)Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
msia <- msia %>%
st_transform(crs = 3168)st_crs(msia)Coordinate Reference System:
User input: EPSG:3168
wkt:
PROJCRS["Kertau (RSO) / RSO Malaya (m)",
BASEGEOGCRS["Kertau (RSO)",
DATUM["Kertau (RSO)",
ELLIPSOID["Everest 1830 (RSO 1969)",6377295.664,300.8017,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4751]],
CONVERSION["Rectified Skew Orthomorphic Malaya Grid (metre)",
METHOD["Hotine Oblique Mercator (variant A)",
ID["EPSG",9812]],
PARAMETER["Latitude of projection centre",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8811]],
PARAMETER["Longitude of projection centre",102.25,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8812]],
PARAMETER["Azimuth of initial line",323.0257905,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8813]],
PARAMETER["Angle from Rectified to Skew Grid",323.130102361111,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8814]],
PARAMETER["Scale factor on initial line",0.99984,
SCALEUNIT["unity",1],
ID["EPSG",8815]],
PARAMETER["False easting",804670.24,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Malaysia - West Malaysia onshore."],
BBOX[1.21,99.59,6.72,104.6]],
ID["EPSG",3168]]
str(msia)Classes 'sf' and 'data.frame': 144 obs. of 15 variables:
$ ADM2_EN : chr "Batu Pahat" "Johor Bahru" "Kluang" "Kota Tinggi" ...
$ ADM2_PCODE: chr "MY0101" "MY0102" "MY0103" "MY0104" ...
$ ADM2_REF : chr NA NA NA NA ...
$ ADM2ALT1EN: chr NA NA NA NA ...
$ ADM2ALT2EN: chr NA NA NA NA ...
$ ADM1_EN : chr "Johor" "Johor" "Johor" "Johor" ...
$ ADM1_PCODE: chr "MY01" "MY01" "MY01" "MY01" ...
$ ADM0_EN : chr "Malaysia" "Malaysia" "Malaysia" "Malaysia" ...
$ ADM0_PCODE: chr "MY" "MY" "MY" "MY" ...
$ date : Date, format: "2020-12-02" "2020-12-02" ...
$ validOn : Date, format: "2021-02-11" "2021-02-11" ...
$ validTo : Date, format: "-001-11-30" "-001-11-30" ...
$ Shape_Leng: num 1.86 1.83 2.36 3.15 1.07 ...
$ Shape_Area: num 0.161 0.0802 0.2336 0.2783 0.0615 ...
$ geometry :sfc_MULTIPOLYGON of length 144; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:1140, 1:2] 556715 556665 556609 556548 556517 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:14] "ADM2_EN" "ADM2_PCODE" "ADM2_REF" "ADM2ALT1EN" ...
u_msia <- st_union(msia)
plot(u_msia)
crime# A tibble: 19,152 × 6
state district category type date crimes
<chr> <chr> <chr> <chr> <date> <dbl>
1 Malaysia All assault all 2016-01-01 22327
2 Malaysia All assault all 2017-01-01 21366
3 Malaysia All assault all 2018-01-01 16902
4 Malaysia All assault all 2019-01-01 16489
5 Malaysia All assault all 2020-01-01 13279
6 Malaysia All assault all 2021-01-01 11495
7 Malaysia All assault all 2022-01-01 10348
8 Malaysia All assault all 2023-01-01 10453
9 Malaysia All assault causing_injury 2016-01-01 5531
10 Malaysia All assault causing_injury 2017-01-01 5024
# ℹ 19,142 more rows
looking at the crime object we can see that “All” fields is included
filter out columns where district is “All”
crime_filtered <- crime %>% filter(district != "All")Convert the date column to year format and removed old date
keep only 2019 and 2022 to match the other datasets
crime_filtered <- crime_filtered %>%
mutate(year = year(date))%>%
select(-date) %>%
filter(year %in% c(2019, 2022)) income <- income %>%
mutate(year = year(date))%>%
select(-date) poverty <- poverty %>%
mutate(year = year(date))%>%
select(-date) inequality <- inequality %>%
mutate(year = year(date))%>%
select(-date) na <- crime_filtered %>%
summarise(na_district = sum(is.na(district)),
na_category = sum(is.na(category)),
na_type = sum(is.na(type)),
na_date = sum(is.na(date)),
na_crimes = sum(is.na(crimes))
)Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)# A tibble: 1 × 5
na_district na_category na_type na_date na_crimes
<int> <int> <int> <int> <int>
1 0 0 0 0 0
na <- inequality %>%
summarise(na_district = sum(is.na(district)),
na_date = sum(is.na(date)),
na_gini = sum(is.na(gini))
)Warning: There was 1 warning in `summarise()`.
ℹ In argument: `na_date = sum(is.na(date))`.
Caused by warning in `is.na()`:
! is.na() applied to non-(list or vector) of type 'closure'
print(na)# A tibble: 1 × 3
na_district na_date na_gini
<int> <int> <int>
1 0 0 0
< do same for other dataset >
function for renaming
rename_districts <- function(data) {
data <- data %>%
mutate(district = case_when(
district %in% c("Iskandar Puteri", "Nusajaya", "Johor Bahru Selatan", "Johor Bahru Utara", "Seri Alam") ~ "Johor Bahru",
district == "Bandar Bharu" ~ "Bandar Baharu",
district %in% c("Brickfields", "Cheras", "Dang Wangi", "Sentul", "Wangsa Maju","W.P. Kuala Lumpur") ~ "WP. Kuala Lumpur",
district == "Nilai" ~ "Seremban",
district == "Cameron Highland" ~ "Cameron Highlands",
district == "Kuala Lipis" ~ "Lipis",
district %in% c("Batu Gajah", "Ipoh") ~ "Kinta",
district == "Gerik" ~ "Ulu Perak",
district == "Manjung" ~ "Manjung (Dinding)",
district == "Pangkalan Hulu" ~ "Ulu Perak",
district %in% c("Selama", "Taiping", "Larut dan Matang") ~ "Larut Dan Matang",
district == "Sungai Siput" ~ "Kuala Kangsar",
district %in% c("Tanjong Malim", "Tapah", "Bagan Datuk", "Muallim") ~ "Batang Padang",
district %in% c("Arau", "Kangar", "Padang Besar") ~ "Perlis",
state == "Pulau Pinang" & district == "Seberang Perai Selatan" ~ "S.P.Selatan",
district == "Seberang Perai Tengah" ~ "S.P. Tengah",
district == "Seberang Perai Utara" ~ "S.P. Utara",
district == "Ampang Jaya" ~ "Gombak",
district == "Kajang" ~ "Ulu Langat",
district %in% c("Pengkalan Hulu","Hulu Perak") ~ "Ulu Perak",
district == "Hulu Selangor" ~ "Ulu Selangor",
district %in% c("Klang Selatan", "Klang Utara") ~ "Klang",
district %in% c("Petaling Jaya", "Serdang", "Sg. Buloh", "Shah Alam", "Subang Jaya", "Sungai Buloh") ~ "Petaling",
district == "Kota Kinabatangan" ~ "Kinabatangan",
district == "Kota Samarahan" ~ "Samarahan",
district %in% c("Matu Daro", "Tanjung Manis") ~ "Mukah",
district == "Padawan" ~ "Kuching",
district == "Kulai" ~ "Kulaijaya",
district == "Tangkak" ~ "Ledang",
district == "Kecil Lojing" ~ "Gua Musang",
district == "Kalabakan" ~ "Tawau",
district == "Telupid" ~ "Beluran",
district == "Beluru" ~ "Miri",
district == "Bukit Mabong" ~ "Kapit",
district == "Kabong" ~ "Saratok",
district == "Maradong" ~ "Meradong",
district == "Pusa" ~ "Betong",
district == "Sebauh" ~ "Bintulu",
district == "Subis" ~ "Miri",
district == "Tebedu" ~ "Serian",
district == "Telang Usan" ~ "Marudi",
district == "Kuala Nerus" ~ "Kuala Terengganu",
TRUE ~ district
))
return(data)
}crime_filtered <- rename_districts(crime_filtered)crime_filtered <- crime_filtered %>%
group_by(state, district, category, type, year) %>%
summarize(crimes = sum(crimes), .groups = 'drop')Unique values in crime_filtered\(district that are not in msia\)ADM2_EN
geospatial_missing <- setdiff(unique(crime_filtered$district), unique(msia$ADM2_EN))
print(geospatial_missing)character(0)
geospatial_missing <- setdiff(unique(income$district), unique(msia$ADM2_EN))
print(geospatial_missing) [1] "Kulai" "Tangkak" "Kecil Lojing"
[4] "Bagan Datuk" "Hulu Perak" "Larut dan Matang"
[7] "Manjung" "Muallim" "Selama"
[10] "Seberang Perai Selatan" "Seberang Perai Tengah" "Seberang Perai Utara"
[13] "Kalabakan" "Telupid" "Beluru"
[16] "Bukit Mabong" "Kabong" "Maradong"
[19] "Pusa" "Sebauh" "Subis"
[22] "Tanjung Manis" "Tebedu" "Telang Usan"
[25] "Kuala Nerus" "W.P. Kuala Lumpur"
income <- rename_districts(income)income <- income %>%
group_by(state, district, year) %>%
summarize(
income_mean = sum(income_mean),
income_median = sum(income_median),
.groups = 'drop'
)inequality <- rename_districts(inequality)inequality <- inequality %>%
group_by(state, district, year) %>%
summarize(
inequality = sum(gini),
.groups = 'drop'
)geospatial_missing <- setdiff(unique(inequality$district), unique(msia$ADM2_EN))
print(geospatial_missing)character(0)
poverty <- rename_districts(poverty)poverty <- poverty %>%
group_by(state, district, year) %>%
summarize(
poverty_relative = sum(poverty_relative),
poverty_absolute = sum(poverty_absolute),
.groups = 'drop'
)geospatial_missing <- setdiff(unique(poverty$district), unique(msia$ADM2_EN))
print(geospatial_missing)character(0)
Start by joining crime with poverty, then add inequality and income
combined_data <- crime_filtered %>%
left_join(poverty, by = c("state", "district", "year")) %>%
left_join(inequality, by = c("state", "district", "year")) %>%
left_join(income, by = c("state", "district", "year"))msia_geometry <- msia %>%
select(1, 13:15)combined_data <- combined_data %>%
left_join(msia_geometry, by = c("district" = "ADM2_EN"))user can choose the type of crime data
combined_data_murder <- combined_data %>%
filter(type == "murder")combined_data_injury <- combined_data %>%
filter(type == "causing_injury")combined_data_all <- combined_data %>%
filter(type == "all")ggcorrmat(combined_data_murder[, 6:11])
ggcorrmat(combined_data_injury[, 6:11])
ggcorrmat(combined_data_all[, 6:11], sig.level = 0.05)
income median and mean are highly correlated since they are really similar and came from the same data set. Might be great for either of the variable to be removed for GWR.
function for generating regression model
user is able to select which variables and data set.
run_regression <- function(data, response, predictors) {
# Create formula from response and predictors
formula <- as.formula(
paste(response, "~", paste(predictors, collapse = " + "))
)
# Run the linear model
model <- lm(formula = formula, data = data)
# Return the model
return(model)
}# Define predictors as a vector of variable names
predictors <- c("poverty_relative", "poverty_absolute", "inequality", "income_mean", "income_median")
# Run the function with the specified data and predictors
murder_model <- run_regression(
data = combined_data_murder,
response = "crimes",
predictors = predictors
)
ols_regress(murder_model) Model Summary
----------------------------------------------------------------
R 0.695 RMSE 2.760
R-Squared 0.483 MSE 7.797
Adj. R-Squared 0.472 Coef. Var 132.240
Pred R-Squared 0.444 AIC 1279.749
MAE 1.805 SBC 1304.674
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1847.330 5 369.466 47.386 0.0000
Residual 1980.435 254 7.797
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------
(Intercept) 0.165 0.582 0.283 0.777 -0.981 1.310
poverty_relative 0.029 0.020 0.104 1.509 0.133 -0.009 0.068
poverty_absolute 0.018 0.024 0.057 0.757 0.450 -0.029 0.065
inequality -22.041 3.030 -0.742 -7.274 0.000 -28.008 -16.073
income_mean 0.005 0.000 3.634 10.779 0.000 0.004 0.006
income_median -0.005 0.001 -2.757 -8.686 0.000 -0.006 -0.004
------------------------------------------------------------------------------------------------
# Define predictors as a vector of variable names
predictors <- c("poverty_relative", "poverty_absolute", "inequality", "income_mean", "income_median")
# Run the function with the specified data and predictors
all_model <- run_regression(
data = combined_data_all,
response = "crimes",
predictors = predictors
)
ols_regress(all_model) Model Summary
--------------------------------------------------------------------
R 0.573 RMSE 534.279
R-Squared 0.328 MSE 288786.035
Adj. R-Squared 0.321 Coef. Var 209.201
Pred R-Squared 0.293 AIC 8021.851
MAE 261.063 SBC 8051.627
--------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 72459258.257 5 14491851.651 50.182 0.0000
Residual 148436021.866 514 288786.035
Total 220895280.123 519
---------------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------------
(Intercept) 69.790 79.162 0.882 0.378 -85.731 225.310
poverty_relative 4.736 2.654 0.099 1.785 0.075 -0.477 9.950
poverty_absolute 1.025 3.259 0.019 0.314 0.753 -5.379 7.428
inequality -3251.324 412.351 -0.645 -7.885 0.000 -4061.425 -2441.224
income_mean 0.747 0.067 3.000 11.110 0.000 0.615 0.879
income_median -0.683 0.076 -2.287 -8.996 0.000 -0.832 -0.534
------------------------------------------------------------------------------------------------------
Define the function
run_stepwise_selection <- function(model, direction = "forward", p_val = 0.05, details = FALSE) {
if (!direction %in% c("forward", "backward", "both")) {
stop("Invalid direction. Choose from 'forward', 'backward', or 'both'.")
}
stepwise_model <- switch(
direction,
"forward" = ols_step_forward_p(model, p_val = p_val, details = details),
"backward" = ols_step_backward_p(model, p_val = p_val, details = details),
"both" = ols_step_both_p(model, p_val = p_val, details = details)
)
return(stepwise_model)
}###forward
murder_fw_mlr <- run_stepwise_selection(
model = murder_model,
direction = "forward",
p_val = 0.05,
details = FALSE
)
print(murder_fw_mlr)
Stepwise Summary
------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------------
0 Base Model 1441.080 1448.202 701.849 0.00000 0.00000
1 income_mean 1374.046 1384.728 634.769 0.23319 0.23022
2 inequality 1343.382 1357.625 604.139 0.32372 0.31846
3 income_median 1279.575 1297.379 541.796 0.47494 0.46879
------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.689 RMSE 2.780
R-Squared 0.475 MSE 7.851
Adj. R-Squared 0.469 Coef. Var 132.696
Pred R-Squared 0.445 AIC 1279.575
MAE 1.853 SBC 1297.379
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1817.974 3 605.991 77.189 0.0000
Residual 2009.792 256 7.851
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
---------------------------------------------------------------------------------------------
(Intercept) 0.004 0.575 0.007 0.995 -1.129 1.137
income_mean 0.005 0.000 3.505 10.577 0.000 0.004 0.006
inequality -17.547 1.825 -0.591 -9.616 0.000 -21.141 -13.954
income_median -0.005 0.001 -2.710 -8.587 0.000 -0.006 -0.004
---------------------------------------------------------------------------------------------
backward
murder_bw_mlr <- run_stepwise_selection(
model = murder_model,
direction = "backward",
p_val = 0.05,
details = FALSE
)
print(murder_bw_mlr)
Stepwise Summary
---------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
---------------------------------------------------------------------------------
0 Full Model 1279.749 1304.674 542.184 0.48261 0.47243
1 poverty_absolute 1278.335 1299.699 540.699 0.48145 0.47331
2 poverty_relative 1279.575 1297.379 541.796 0.47494 0.46879
---------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.689 RMSE 2.780
R-Squared 0.475 MSE 7.851
Adj. R-Squared 0.469 Coef. Var 132.696
Pred R-Squared 0.445 AIC 1279.575
MAE 1.853 SBC 1297.379
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1817.974 3 605.991 77.189 0.0000
Residual 2009.792 256 7.851
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
---------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
---------------------------------------------------------------------------------------------
(Intercept) 0.004 0.575 0.007 0.995 -1.129 1.137
inequality -17.547 1.825 -0.591 -9.616 0.000 -21.141 -13.954
income_mean 0.005 0.000 3.505 10.577 0.000 0.004 0.006
income_median -0.005 0.001 -2.710 -8.587 0.000 -0.006 -0.004
---------------------------------------------------------------------------------------------
both
murder_sb_mlr <- run_stepwise_selection(
model = murder_model,
direction = "both",
p_val = 0.05,
details = FALSE
)
print(murder_sb_mlr)
Stepwise Summary
-------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------------------
0 Base Model 1441.080 1448.202 701.849 0.00000 0.00000
1 income_mean (+) 1374.046 1384.728 634.769 0.23319 0.23022
2 inequality (+) 1343.382 1357.625 604.139 0.32372 0.31846
3 income_median (+) 1279.575 1297.379 541.796 0.47494 0.46879
4 poverty_relative (+) 1278.335 1299.699 540.699 0.48145 0.47331
-------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
----------------------------------------------------------------
R 0.694 RMSE 2.763
R-Squared 0.481 MSE 7.784
Adj. R-Squared 0.473 Coef. Var 132.130
Pred R-Squared 0.448 AIC 1278.335
MAE 1.813 SBC 1299.699
----------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------
Regression 1842.865 4 460.716 59.188 0.0000
Residual 1984.900 255 7.784
Total 3827.765 259
---------------------------------------------------------------------
Parameter Estimates
------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
------------------------------------------------------------------------------------------------
(Intercept) 0.102 0.575 0.178 0.859 -1.031 1.236
income_mean 0.005 0.000 3.623 10.767 0.000 0.004 0.006
inequality -20.859 2.594 -0.702 -8.040 0.000 -25.968 -15.750
income_median -0.005 0.001 -2.774 -8.770 0.000 -0.006 -0.004
poverty_relative 0.033 0.019 0.119 1.788 0.075 -0.003 0.070
------------------------------------------------------------------------------------------------
metric <- compare_performance(murder_model,
murder_fw_mlr$model,
murder_bw_mlr$model,
murder_sb_mlr$model)metric$Name <- gsub(".*\\\\([a-zA-Z0-9_]+)\\\\, \\\\model\\\\.*", "\\1", metric$Name)plot(metric)
Function to create a coefficient plot
create_coef_plot <- function(model, conf.level = 0.95) {
ggcoefstats(model,
sort = "ascending",
title = "Estimated Coefficient",
conf.level = conf.level) +
theme(text = element_text(size = 16))
}create_coef_plot(murder_sb_mlr$model, conf.level = 0.95)
Checksfor multicollinearity
out <- plot(check_model(murder_sb_mlr$model,
panel = FALSE))For confidence bands, please install `qqplotr`.
out[[2]]
normality test
plot(check_normality(murder_sb_mlr$model))For confidence bands, please install `qqplotr`.

A histogram might be a better approach for user to tell the distribution at a glance instead.
ols_plot_resid_hist(murder_sb_mlr$model)
plot(check_outliers(murder_sb_mlr$model,
method = "cook"))
mlr_output <- as.data.frame(murder_sb_mlr$model$residuals) %>%
rename(`SB_MLR_RES` = `murder_sb_mlr$model$residuals`)because of empty rows we need to pad “NA”
murder_residual <- data.frame(MLR_RES = rep(NA, nrow(combined_data_murder)))
murder_residual[rownames(mlr_output), "MLR_RES"] <- mlr_output$SB_MLR_REScombined_data_murder <- cbind(combined_data_murder,
murder_residual)combined_data_murder_st <- st_as_sf(combined_data_murder)tm_shape(msia)+
tmap_options(check.and.fix = TRUE) +
tm_polygons(alpha = 0.4) +
tm_shape(combined_data_murder_st) +
tm_dots(col = "MLR_RES",
alpha = 0.6,
style="quantile") Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

plot(check_collinearity(murder_sb_mlr$model)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Variable `Component` is not in your data frame :/

while its good to show the graph above is great to show the collinearity, showing it in a map might be better as shown below
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(combined_data_murder_st) +
tm_polygons(col = "MLR_RES", alpha = 0.6)+
tm_view(set.zoom.limits = c(5, 9))Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.